Abstract. The assumption that a parametric class of functions 
fits the data structure sufficiently well is common in fitting 
curves and surfaces to regression data. One then derives a pa- 
rameter estimate resulting from a least squares fit, say, and in 
a second step various kinds of goodness of fit measures, to 
assess whether the deviation between data and estimated sur- 
face is due to random noise and not to systematic departures 
from the model. In this paper we show that commonly-used y^- 
measures are invalid in regression models, particularly when 
inhomogeneous noise is present. Instead we present a bootstrap 
algorithm which is applicable in problems described by noisy 
versions of Fredholm integral equations, of the first kind. We 
apply the suggested method to the problem of recovering the 
luminosity density in the Milky Way from data of the DIRBE 
experiment on board the COBE satellite. 
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1. Introduction 

Regression problems arise in almost any branch of physics, in- 
cluding astronomy and astrophysics. In general, the problem 
of estimating a regression function (or surface) occurs when 
a functional relationship between several quantities of interest 
has to be found from blurred observations {yi,ti), i = 1, ■ ■ ■ ,n. 
Here y — (yi, • • • , yn) denotes a vector of measurements (re- 
sponse vector) and t — (ti, • • • , t„) a quantity which affects 
the response vector in a systematic but blurred way , which is 
to be investigated. This systematic component is usually de- 
noted as the regression function E[Yi] — uj{ti). Note that Yi 
is a random variable, of which yi is a realisation. If G M, 
this includes signal detection problems or image restoration if 
e M^. Many problems bear the additional difficulty that the 
quantity of interest is not directly accesible to the observations 
y and the relationship has to be expressed by a noisy version of 
a Fredholm integral eq. of the first kind, viz. 



io{ti) + ei = {Kp){U) + i 



(1) 



where K is a given integral operator, p the regression func- 
tion to be reconstructed and e = (ei, • • • , e„) a vector of inde- 
pendent random quantities (error), due to imprecise measure- 
ments and other sources of noise. More precisely, we assume 
that the expectation of yi is given by (Kp)(ti) and inhomoge- 
neous noise might be present, i.e. the variance af of the noise 
(and possibly higher moments, too) depends on the grid point 
ti. There is a vast amount of literature concerning statistical 
theory for the estimation of p, we mention only Wand & Jones 
(1995) for direct regression and Nychka & Cox (1989) or van 
Rooij & Rymgaart (1996) for the inverse (sometimes denoted 
as indirect) case, as in eq. ([l|). (Inverse) regression models cap- 
ture various examples from astronomy and physics (cf. Bertero 
( |1989[ ) or Lucy ( |1994ij , |1994b| ) for an overview). Such an ex- 
ample is the reconstruction of the three-dimensional luminosity 
in the Milky Way [MW], which will be discussed extensively 
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in sect. || In this example, p will be a three-dimensional den- 
sity of the MW, K the operator that projects this density to the 
sky, Kp(ii) the resulting surface brightness at the sky position 
ti = {I, b)i and yi the observed surface brightness at {I, b)i. 

Reconstruction procedures (estimation) of p in general depend 
on various a priori assumptions about p, such as smoothness 
properties or geometrical constraints, e.g. monotonicity. The 
most common assumptions are that p has a particular structure 
and shape, depending on some unknown parameter d. Such an 
assumption is denoted as a parametric model. Typically, these 
strucutural assumptions arise from physical reasoning and ap- 
proximation procedures. Often, however, it is not completely 
clear whether these assumptions are satisfied and therefore it 
is an important task to investigate empirically (by means of the 
data at hand) whether the resulting model is valid. Therefore, in 
this paper we discuss recent methodology for the investigation 
of the adequacy of such a parametric model U = {p^}^£e, 
Q C This will be done for regular regression problems as 
well as for the inverse case, as in ([l|). 

The paper is organized as follows. In the next sect, we briefly 
review common practices to judge the goodness of fit of 
a model U. It is shown that classical goodness of fit ap- 
proaches, such as least square statistics are insufficient from 
many methodological points of view, particulary when inho- 
mogeneous noise is present, i.e. the variation erf of the error Ei 
is expected to vary with the grid point (covariate) ti. We show 
in section 2 that statistically valid conclusions about the good- 
ness of fit from the residuals — J2 {Vi ~ i^P§)i'^i)) 
(or variants of it) are impossible in general, particularly when 
inhomogeneous noise is present, as is the case in our data ex- 
ample. This is mainly due to the fact that in the inhomoge- 
neous case the distribution of ^ depends on the whole vec- 
tor (cr^, . . . , (T,^j) which is in general unknown. Therefore, we 
suggest in sect. 3 a measure of fit which is based on "smoothed 
residuals" and which allows for the calculation of the corre- 
sponding probability distribution. In sect. ^, a bootstrap resam- 
pling algorithm is suggested which allows the algorithmic re- 
construction of the distribution of the suggested goodness of fit 
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quantity. The use of bootstrap techniques is well documented 
in astronomy (cf. Barrow et al. (1984), Simpson and Mayer 



( 1986), van den Bergh and Morbey (1984) for various applica- 



2) we can quantify the probability that ^ exceeds some "crit- 
ical value" in order to obtain a precise probabilistic analysis 
(computing significance levels, confidence intervals, etc.)- 



tions). The work similar in spirit to ours is Bi & Borner's (1994 ) 



residual type bootstrap, used as a method for nonparametric 
estimation in inverse problems. As a byproduct we show, how- 
ever, that this residual bootstrap is insufficient in the case of 
inhomogeneous noise in the data and a so-called 'wild' boot- 
strap has to be used instead. 

Finally we will apply our new method in sect. 5 to the fit of 
the COBE/DIRBE L-band data. We use a functional form for 
a parametric model of the MW as presented by Binney et al. 
( 1997 hereafter BGS) and find similar structural parameters of 
the Milky disk and bulge, except for the scale height of the disk 
which we find to be about 25% smaller. 

2. Common methods of judging the quaUty of fit 

One of the most popular techniques for finding a proper fit of a 
given model J7 to a given set of data yi , • • • , j/n is to minimize 
a (penalized) weighted sum of squares 

n 

i=i 



As a rough rule of thumb often 

1 o _ 



(2) 



is taken as a measure of evidence for the model U and hence 
for the fit of uj^ . Here d denotes the dimension (number of 
parameters) of U and n the number of data points. Examples of 
the use of this kind of statistics can be found in Alcock et al. 



(1997) and in Dwek et al. (1995) in the context of discriminat 



ing between several models. A related well-known quantity is 
the sum of squares of 'expected minus observed divided by ex- 
pected' for testing distributional assumptions, such as normal- 
ity of the data (cf. Cox & Hinkley |1974| ). Bi & Borner ( |1994 
considered a similar quantity in a deconvolution setup which 
is, using the notation of (|l]) 

2 

(3) 



This obviously downweights the influence of residuals if the 
corresponding predicted value uj^{ti) is large. Another option 
is to consider the absolute deviation of the predicted and ob- 
served values, which leads to a more robust version of x^, or 
even mor e gen eral dista nce m easur es can be use d (Cook & 



Weisberg 



1999 



Hocking 



1996 



Lucy 



1994£ 



1994b). In thefol- 



where the Wi denotes some weighting scheme and the model is 
Wfl(^i) = (Kp^)(ti). This leads to a weighted least squares es- 
timator (WLSE) of the optimal model parameter, dw. However, 
it is well known that a proper choice of the weights wi, . . . ,Wn 
depends on the (possibly position-dependent) random noise in 
the data. For example, under an uncorrected normal error as- 
sumption, if the variance of the error Si is assumed to be 
known, a suitable choice of weights is Wi = cr^'^ in order to 
take into account the local variability of the observations at 
the grid point ti. Particularly, in this case, the ordinary, un- 
weighte d leas t squares estimator is known to be insufficient 
(Ganant [l987| ), because the log likelihood of the model is pro- 
portional to Q'^ji'd). Only if the variance pattern is homoge- 
neous (i.e. af = 0-2) are unweigthed least squares estimators a normal distribution of the data) = E vf 

i—l 

as a sum of normally distributed variables having the expecta- 



lowing we will argue that an approach like is not valid in 
regression models such as (0), particularly when the noise is 
inhomogeneous or the residuals are not gaussian. To this end 
we briefly discuss the (asymptotic) distribution of the above- 
mentioned quantites. 

In order to get a first insight into the probabilistic behaviour 
of statistics such as x^ , used as a quantitative measure of fit, 
it is helpful to consider the distribution in the simplest case 
when cj = 0. A simple calculation then shows that (assuming 

is distributed 



optimal. The weighted least squares approach is, however, lim- 
ited if the the local variances (t| in the data points are unknown. 
The erf then have to be estimated from the data. This is often 
neglected. It is also common practice to consider Q"^ in or- 
der to judge the quality of fit achieved by the regression func- 
tion (where the weights Wi may sometimes be different to those 
used in computing the WLSE). Here, a 'large' value of Q"^ is 
used as an indicator for a 'significant' deviation between the 
observations and the model to be fitted. We wiU investigate this 
in more detail in what follows, and emphasize the case of non- 
homogeneous variances. 

In general, the most important properties required of any good- 
ness of fit (GoF) quantity x^ such as x^ are that 

1) we are able to detect with high probability deviations from 
the model we have in mind (often denoted by statisticians 
as good 'power'). 



tion E[x^] = J2 '^i' ™d variance V[x^] = 2 ^ erf. Hence, 

i=l i=l 

already in this simple case it can be seen that the determina- 
tion of the law of x^ is practically impossible if the variances 
af are not known. Then it is difficult to quantify what a "too 
large value of x^" means, because this will depend on the un- 
known quantities af,...,a^, and a rule as in (^ can lead in 
principle to any result in favour or against the model uj = 0. 
We mention that standardisation by the predicted values as in 
(^ does not avoid this problem. This is in contrast to goodness 
of fit problems for the assessment of distribution assumptions, 
i.e. when one investigates by a x^ measure whether a popula- 



tion is normal, say (Cox & Hinkley 1974). Note, that the case 



of homoscedastic regression models (i.e. the distribution of the 
noise is identical for all data points) is somewhat simpler, be- 
cause here the expectation £'[x^] = na^ and the square root 
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of the variance Vlx^] 
noise ratio 



2ncr'* is proportional, i.e. the signal to 



only depends on the number of data points n. Here, a model- 
free estimator of can be used as a reference scale (Hart 



1997) 



Many attempts were made in order to find simple approxi 
mations of the distribution for xt- Among them a quite at 
tractive option is use of a bootstrap method, an algorithmic 
approximation of the true law (see Efron & Tibshirani 1993 
for an overview and many applications). Bootstrapping ran- 
dom quadratic forms (such as x^) is, however, a rather delicate 
matter, because standard bootstrap algorithms such as Efron's 
( 1 979 ) n out of n bootstrap are inconsistent (Babu ( 1 984 ), Shao 
& Tu (1995)), i.e. the distribution is not approximated correctly 



with increasing number of observations. 

The use of a particular bootstrap algorithm was indeed sug- 
gested by Bi & Borner (1994) in the context of assessing the 
goodness of fit in deconvolution models. We mention that their 
bootstrap algorithm, however, is asymptotically not correct in 
inhomogeneous models. Interestingly, the suggested algorithm 
is similar in spirit to the so called residual bootstrap (i.e. 
drawing random samples with replacement from the residuals 
r.i) which is well documented in the statistical literature (cf 



Davison & Hinkley (1997) p.281) for the estimation of the 
regression parameters). 

Despite the abovementioned difficulties, the main problem en- 
countered with the naive use of in regression models as a 
measure of GoF is that asymptotically (here and in the follow- 
ing, asymptotically means the sample size tends to infinity) the 
law of x^ does in general not converge asymptotically to any 
reasonable quantity, in contrast to goodness of fit testing for 
distributional assumptions. Even after rescaling by 1/ y/n in or- 
der to force the variance 



v[i/V^x' 



i=l 



\t)H{dt) 



to converge (here it is assumed that the scheme of grid points 
can be described asymptotically by a distribution H (Dette & 



Munk 1998 1) gives 



E[l/V^X^ 



i=l 



) = o(V^) 



which shows that l/yn x does not converge to any reason- 
able quantity. Note that if we use x^/'^ the variance tends to 
zero. Also observe that subtracting E[l/^/n x^\ from x^ will 
not provide a way out of the dilemma because this value de- 
pends on the (unknown) variances af. 

In summary, we see that without explicit knowledge of the vari- 
ances cr|, the use of x^ as a quantitative measure of validity of 
a model is not appropriate. 



Due to the above-described difficulties, statisticians through- 
out the last two decades have extensively studied the problem 
of checking the goodness of fit in regression models. It is be- 
yond the scope of this paper to review this work; many refer- 



ences can be found in the recent monograph by Hart (1997) 



Among the variety of procedures suggested so far, we mention 
methods which are based on model selection criteria, such as 



Akaike's (1974) information criterion (Eubank & Hart (1992), 



Aerts et al. (1999)) and methods which compare nonparamet- 



ric estimators w ith a p arametric estimator. To thi s end Azza- 
l ini & Bowman ( |l993| , Hardle & Mammen (| l993 [) and Muller 
(1992) used a kernel estimator. Co x et a l. (1988) smoothing 
splines and Mohdeb & Mokkadem ( 1998 ) a Fourier series es- 
timator. However, the applicability of many of these methods 
is often limited. For example, Hardle & Mammen's test is con- 
fronted with bias problems, whereas other procedures are only 
applicable for homogeneous errors or when the er ror distribu- 
tion is completely known (Eubank & Hart 1992 Aerts et al. 



1999). Another serious difficulty arises with the nonparametric 



estimation of the signal as the dimension k of the grid points 
increases. This is sometimes denoted as the curse of dimen- 
sionality (Wand & Jones 1995, Bowmann & Azzalini 1997). A 



rough rule of thumb is that the number of observations required 
in dimension k is tl^ in order to obtain the same precision of 
the estimate of lo. Hence, the precision induced by 100 obser- 
vations on the real line is approximately the same as 10,000 
drawn from the plane. Furthermore, measurements often can- 
not be taken equidistantly over a grid, which leads to sparse 
data structures causing further difficulties with increasing di- 
mension. One should also note that another difficulty consists 
of transferring these methods to the case of inverse problems, a 
situation which up to now has never been treated. 



3. A new method 

Munk & Ruymgaart (1999) have developed a general regres- 
sion methodology which remains valid in the heteroscedas- 
tic case (i.e. the distribution of the noise depends on the data 
point) with arbitrary dimensions of the grid points. The under- 
lying idea dates back to H.Cramer and can be summarized as 
"smoothing the residuals" in order to obtain asymptotical stabi- 
lization of the test criterion. In our context this reads as follows. 
Let T denote an injective smoothing linear integral operator 
with associated integral kernel r(-, •), i.e. 



T[/](^) 



T{u,t)f{t)dt. 



(4) 



Note that since T is an integral operator, T[/] is again a func 
tion. Now consider the transformed distance between the para 
metric model V — {w^ij^ee and the distribution uj, which un 
derlies the observations yi — uj{ti) + Si (cf. sect. 1), 

D\g)^D\T[L,])^min\\T[u-u^]f 



(5) 



where g — T[cj] denotes the smoothed version of lo and the 
norm ||.|| refers to some L^-norm to be specified later on. 
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The smoothed distance serves now as a new measure of 
goodness of fit and has to be estimated from data. This will 
be done by numerical minimization of the empirical counter- 
part of the r.h.s. of (||), ||.g — = xi)!'^) where g{u) = 

n 

1/n J2 Vi T{u,ti) denotes an estimator of g{u). In addition 

i=l 

this provides us with a smoothed estimator ??t of the value 
for which the minimum in (5) is achieved. In Munk & Ruym- 
gaart (1999) the kernel 

T(u,t) — min(u,t), 

was suggested (see Appendix Al), which will also be used in 
the following, and which amounts to a cumulative smoothing. 
Not that for fc-dimensional u and t the minimum has to be un- 
derstood componentwise as 

k 

min(u,t) = J^min(uj,ti) 

i 

We mention that other choices of T are possible (cf. Sect. Q). 

The reasoning behind this approach is that direct estimation of 
w is a rather difficult task, whereas estimation of the smoothed 
transformation g — T(a;) is much simpler Furthermore, the 
distribution of the minimizer of x^jji'd) becomes tractable. If 
we denote the minimum of 1 1 5 — 1 1 ^ as l)^ one can show un- 
der very mild regularity conditions that (Munk & Ruymgaart, 
1999) the distribution of nD^ converges to 



00 



(6) 



where the Ai denotes a decreasing sequence of positive num- 
bers which depend on the best model parameter d^, which 
is the minimizing d in (^, the model space U and the un- 
known distribution of the error e, including the variance func- 
tion a'^{ti). This makes a direct application of this limit law dif- 
ficult and hence a peculiar bootstrap algorithm is suggested in 
the following which can be shown to be asymptotically consis- 
tent, i.e. the asymptotic limit law of this algorithm is the same 
as in (^. The following idea of the so called "wild bootstrap" 
dates back to Wu ( |1986 ) and was applied by Stute et al. ( |199S| ) 
to a testing problem similar to the one above. 



4. The wild Bootstrap Algorithm 

The true distribution (6) of depends on the unknown Ai. It is 
therefore not possible to use this distribution for practical pur- 
poses. However, it is possible to approximate the distribution 
numerically using the following bootstrap algorithm: 

Step 1: {Generate residuals). Compute residuals 



1, 



where iJt denotes a solution of the minimization of 

:=x'(^?t) :=min||g-T^^||2. 



CV6+1)/<2V5) 



(V5-1)/<2V5) 



-(v'5+1)/2 



(V5+1)/2 



Sample space 



Fig. 1. Binary probability distribution required in step 2 of the 
wild bootstrap algorithm. 

Step 2: {The "wild" part). Generate new random variables 
c*, i = 1, . . . ,n, which do not depend on the data, where 
each c* is distributed to a distribution which assigns probabil- 
ity (V5 + l)/2V5tothe value (-\/5-l)/2 and (\/5-l)/2\/5 
to the value (\/5 + l)/2. See fig. | for a visualization of this 
probability distribution. 

Step 3: {Bootstrapping residuals). Compute e* := iiC* and 

Ut = ^^.j, + £*■ This gives a new data vector (?/*, ii)j=i,...,"- 

Step 4: {Compute the target). Compute l)^' 

iVi I ti)i=l,...,n 



with 



Step 5: {Bootstrap replication). Repeat step 1-4 B times 
{B — 1000, say) which gives values D{ , . . . , . 



Now we construct the empirical cumulative distribution func- 
tion [ECDF], which can be taken as an approximation for the 
right side in (|^), because Munk & Ruymgaart (1999) have 
shown that the ECDF, based on D^ '*, asymptotically approx- 
imates the distribution of D^. The ECDF can be obtained by 



ordering the values of D^' 



, Dg increasingly and plot- 



ting them against the value (i) /B, where (/) denotes the posi- 



tion of D,: 



, D'^g-j ■ The so- 



in the ordered sample Z3^j^^ , . 
called estimated evidence of the model U can now be ob- 
tained by determining the position of the original statistic 



This is some number 



in the ordered sample Df[-^ , • • ■ , ^{b) 
k* £ {0, . . . , -B + 1}. From this number one computes 

a* = 1- k*/B. 

Statisticians denote a* as the p-value of the test statistic D^. 
The interpretation of this value is as follows. A small a* indi- 
cates that the observed data are very unlikely to have been gen- 
erated by model U, because the probability that the observed 
(or a larger value) occurs is very small, namely a* (recall 
that the bootstrap algorithm reproduces the true distribution of 
in (^)). On the other hand, if a* is large (and hence D"^ 
small) there should be rare evidence against a proper use of 
model U, because provides a good fit of the data to the 
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model compared to all other possible outcomes which could 
have occured. 

A formal test at significance level a can be performed when 
deciding against JJ if 



a < a. 



(7) 



In other words a small a* indicates that the deviation from the 
model U is not simply due to noise, but rather a systematic 
devation from the model V has to be taken into account. 

We would like to close this section by making some remarks 
about the applicability of bootstrap algorithms in the context of 
goodness of fit, and giving some arguments why our boqtstra] 
algorithm is valid in the heteroscedastic case. Stute et al. ( 199 
have shown that the wild bootstrap is valid in heteroskedastic 
models with random grid points t. This result can be extended 
to deterministic grid points, as is the case in our example, pro- 
vided the scheme is not 'too' wiggly (a precise formulation 



can be found in Munk (1999)), which holds true for the sub- 



sequent example. We mention that an explanation for the wild 
bootstrap validity is its automatic adaptivity to inhomogeneous 
variances, because it can be shown that the variance in the artif- 
ical datapoints y* induced by "wild" resampling (step 2 in our 
algorithm) yields 

which estimates approximately a^iti). In contrast, the n out 
of n bootstrap (cf. Stute et al. 1998) and the residual bootstrap 
here fail to hold because the bootstrap variance is in the lat- 
ter case i which approximates the average overall 
variance JJ a'^{ti,t2)dtidt2 in our example. This argument 
transfers essentially to any random quadratic form (such as 
or Bi & Borner's (1994) x^-statistic). The residual boot- 



strap is consistent only if the error is homoscedastic, which, 
however, in the subsequent example is not the case. The case 
when the model space U is of dimension oo, as considered 
by Bi & Borner (1994), is in principle similar; here it is also 
well known that the residual bootstrap is insufficient in het- 



eroscedastic models (Hardle & MaiTon 1991) 



5. Application: recovering the luminosity distribution in 
the Milky Way 

The DIRBE experiment on board the COBE satellite, launched 
in 1989, made measurements of the surface brightness in sev- 
eral infrared wavebands (Weiland 1994). A difficulty with the 
COBE/DIRBE data is that it has to be corrected against cer- 
tain effects. The most important correction is the removal of 
dust absorption. This has been done by Spergel et al. (1996). 



We use their corrected COBE/DIRBE L-band data in our fits. 
The resolution of the data are n x m = 120 x 40 points in 
I, b respectively, covering a range —89.25° < / < 89.25° and 
-29.25° < 6 < 29.25°. The points in this two-dimensional grid 
are equally spaced. 



The COBE/DIRBE data have been used to deproject the three- 
dimensional density of the MW in a number of projects. A main 
difficulty in recovering the three-dimensional luminosity dis- 
tribution from the two-dimensional surface brightness distribu- 
tion of the MW is that it is not a unique operation, in general. 
One way to avoid this problem is to fit a parametric model to 
the MW in order to reduce the set of possible models. Several 
parametric models have been suggested, see for example Kent 



et al. (1991), Dwek et al. (1995l or Freudenreich (1998). An 



other approach is to use the non-parametric Richardson-Lucy 
algor ithm for the d eproje ction of the dat a (Bin ney & Gerhard 



1996i Binney et al. |1997t Bissantz et al. |1997| ), in order to re 



construct the luminosity distribution of the MW. 

In parametric models of the MW density, about ten "structural 
parameters" - including normalisations, scale lengths and geo- 
metrical shape parameters of the bulge/bar - are used (see, for 
example, Kent et al. 



1991, Dwek et al. 1995 & Freudenreich 



1998). In what follows, we assume that these parameters are 



selected such that the projection of a model onto the sky is an 
injective operation. 



5.7. The basic eqs. of the astrophysical problem 

We will first derive a general mathematical model of the prob- 
lem of recovering the MW luminosity distribution from the L- 
band data. The projection of a three-dimensional light distribu- 
tion to a surface brightness (on the sky) is defined as follows. 
Let H be the set of possible luminosity densities of the MW, i.e. 
of maps 



p:R^-^M>o, p{x) 

and Vl be the surface brightness distributions 



[0,2^] 



2' 2 



>o, b) uj{l, b), 



where uj{l, b) is the surface brightness at sky position {I, b), and 
Lj E il. The transformation between a luminosity density p to 
its corresponding surface brightness distribution is described 
by a linear integral operator V. We will call this operator V the 
projection operator, since it "projects" a luminosity density on 
the sky, i.e. onto a surface brightness distribution. 



V{p) 



V is defined by the integral J p{x)dr of the density p{x) along 
the line-of-sight from the observer to infinity in direction [Ub). 
Let r denote the distance from the observer Note that the in- 
tegrand is p and not p x because the physical extend of the 
observed cone Sfl increases as whereas the intensity of a 
source decreases as and the r-powers therefore cancel out. 
Let s{r,l, b) be the path from the observer to infinity along the 
line-of-sight to {l,b), parametrized by the distance from the ob- 
server r. Then 

POO 

u;{l,b) = V{p){l,b)= / p{s{r,l,b))dr. 
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Galactic center 



problem is to reconstruct p from the noisy integral eq. 



/1=0° 



Gala^tocen0(/x "^^.^^^^^^Galactic longitude 1 

Galactocentric y 



Galactocentric z 
Galactic latitude b 



Bar angle 
Phi 



Observer 

Fig. 2. A sketch of the two coordinate systems that we use in 
this paper. Luminosity densities of the MW are defined in the 
galactocentric coordinate system y, z. Galactic longitude I 
and latitude h define a position on the sky. Together with the 
distance from the observer r they constitute the observer cen- 
tered coordinate system. 

So far we have used a spherical coordinate system centered at 
the observer. Coordinate axes are the sky longitude I and lati- 
tude b and the distance from the observer r. We now introduce a 
second coordinate system. Let (cc, y, z) be the coordinate axes 
of a cartesian, galactocentric, coordinate system, s.t. x and y 
lie in the main plane of the MW. We define x to be along the 



major axis of the bulge/bar (cf sect. 5.2 for further explanation 



of the components of our parametric MW model, including the 
bulge/bar), y along the minor axis, and z perpendicular to the 
main plane of the MW. Let us further call "bar angle $" the 
angle between the major axis of the bar/bulge and the line-of- 
sight direction from the observer to the galactic centre. The 
position of the observer in this coordinate system is denoted as 
(x0 , y©, zq). Fig. ^ depicts the two coordinate systems. 

In our setting, a parametric model of the MW is a class U of 
distributions p§ such that 

where Q denotes a set of parameters C M''. This defini- 
tion is in accordance with our terminology at the beginning of 
this paper. We cannot observe lo directly, due to measurement 
errors, dust removal from the raw data and other sources of 
noise. Hence, our observations j/y = uj{li, bj)+eij are blurred 
by some random error , the distribution of which may vary 
between different sky positions {li,bj). Particularly, we will 
see in the following that it is necessary to allow for a position- 
dependent noise Var[eij] — of y Therefore our astrophysical 



Vi] =V{p){k,bj) +e^y 

Note that this is not a noisy Fredholm eq. of the first kind as in 
(Q); however the suggested method in the last section transfers 
directly to the present setting. Note, that T' is a linear injective 
operator as long as p > due to our selection of the parametric 
model. 

Let 

uji){l,b)^V{p.9){Ub)- i9ee, 

and consider the transformed model 
where 

V^VU = {u^{l,b)}^^^ 
and 

T[w](r,5') = j J uj{l,b)T{{l',b'),{l,b))dldb. 

Specific models U will be discussed in the next sect. Follow- 
ing the approach in sect. Q we specify the smoothing integral 
operator T by defining the smoothing kernel as 

T((r, b'), {I, b)) = min{Z, I'} ■ min{6, 6'}; (/, 6), (/', b') G 

This amounts to a kind of cumulative smoothing, which down- 
weights small-scale features in the data, and emphasizes trends 
on large scales. 

Now, as a first step, we estimate g{l' , b') — T[uj\{l\ b') by 

n m 

m', - E E v^=T{{i', b'), {k,b,)) 
1=1 j=i 

and determine numerically the 'transformed' LSE 

i?T = argmin^ggll^ - T[cji,]|p (8) 

where 1 1 • | p denotes the usual L^-norm. Finally, the minimising 
value 

D^^\\g^T[p{p^^^)W 

is computed. Now the bootstrap algorithm in sect. ^ can be 
applied. Finally, we mention that for the minimization in (||) 
we have used the Marquardt-Levenberg algorithm (Press et al. 



1994). 



5.2. A parametric model of the Milky Way 

We will now investigate whether the functional form of the lu- 
minosity distribution of the MW as suggested by BGS provides 
a satisfactory fit to the COBE/DIRBE L-band data. This func- 
tional form is a superposition of a double-exponential disk with 
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a truncated power-law bulge 

'g-|z|/zo g-kl/^l 

p^{x, y,z) = d - { h a ■ 



r/rd 



disk 



2 I 2 

X +y , 



bulge/bar with cusp 

2 / \ 2 



(9) 



where the parameter i) can be devided into "structural" param- 
eters (zq, Zl, rii, b, Gjn, Ci o,c, d, q) that specify the functional 
form of the model and "geometrical" parameters that define the 
position of the sun in the coordinate system. The "geometrical" 
parameters are fixed in advance and are the position of the sun 
above the main plane of the MW, z© — 14pc, the distance of 
the sun from the galactic centre projected on the main plane, 
r0 = 8kpc and the bar angle, (fi — 20° (BGS). It is not feasible 
to estimate the cusp parameters ac — lOOpc and q — 1.8 from 
our data, because the available resolution is not sufficient. We 
use the same values as BGS. 

As a first step we will investigate graphically whether an in- 
homogeneous variance pattern has to be assumed, which is in- 
dicated by inhomgeneous squares of residuals. Fig. || shows 
the COBE/DIRBE L-band data and figure I the residuals r,,- = 



(^Uij — ^ ' with the model parameters i^bgs taken from 

BGS. Provided this model holds (approximately) true, as an 
important conclusion from Fig. ^ we find strong indication for 
inhomogeneous noise. Interestingly, towards the boundary of 
the observed part of the sky, the variability of the observations 
increases. Fig. ^ shows the difference between the model and 
the data including the algebraic sign of the difference. Note that 
the error distribution is obviously inhomogeneous, both in the 
logarithmic magnitude scale plotted in the figures and in a lin- 
ear scale. Further note there is a systematic dependence of the 
sign of the deviations on the position on the sky, whereas the 
model fits well in the central part of the observed part of the 
sky. This is indication that the MW disk shows deviations from 
an exponential z -dependence. 

We now determine the best-fit model parameter 9* by minimi- 
sation of D^. We use the parameters found by BGS as starting 
values for our minimisation algorithm. Due to the increasing 
noise towards the boundary of the observed part of the sky, we 
restrict the region of the surface brightness data used in the fit to 
the region |Z| < 60° and -20° <b< 10°. This is done to down- 
weight those parts of the sky where noninformative parts in the 
data are expected (see fig. Fig. ^ shows this data after it has 
been smoothed with the smoothing operator T. Note how much 
smoother the smoothed data appears compared to the original 
COBE/DIRBE L-band data. Our computational strategy con- 
sists of two steps. 



1 . Fitting the disk: In the first step we fit the disk parameters 

with fixed bulge parameters. 

2. Fitting the bulge/bar: In the second step we fix the disk re- 

lated parameters found in the first step (except for the nor- 
malisation parameter d) and fit the bulge/bar parameters 
and d. 



Table 5.2 shows our result for the best-fit model parameters t?* 
and the model parameters of BGS. As suggested in sect. ^ we 
have obtained our best-fit model parameters by minimisation 
of D^. Note that BGS have not used exactly the same region 
of the sky in their fit as we use here. Therefore, one has to take 
into account that differences between these model parameters 
may be partially due to different regions of the sky (data) used 
in the fit. We reduce this problem by redetermining the nor- 
malisations b, d of the model by BGS (keeping fixed their other 
parameters) with our algorithm, using the region of the sky se- 
lected above. The value of our proposed statistical quantity a* 
for the BGS model has been calculated for this modified ver- 
sion of their model parameters. 

Applying the bootstrap algorithm presented in sect. ^ to our 
model we find that a* = 0.86, which indicates no significant 
evidence against this model. For the parameters found in BGS 
a value a* = 0.80 is obtained which yields a slightly worse 
fit. Note that, at a first glance, this statement is in contradiction 
to the argument given by BGS in the last paragraph of their 
p366. They pointed out that a graphical inspection of residu- 
als suggests that for the model considered, some local regions 
of the sky seem to show systematic differences between their 
model and the observed data. As a conclusion we find that the 
proposed method is not capable of concluding that these lo- 
cal deviations between model and data are due to systematic 
deviations. As pointed out by the referee this might be due 
to lack of power of the proposed method, because an addi- 
tional smoothing step was proposed. Indeed, this corresponds 
to some theoretical results concerning the asymptotic efficiency 
of the proposed method (Munk & Ruymgaart (1999)). In fact, a 
more powerful method could result from chosing a data-driven 
smoothing operator T, similar to bandwidth selection in kernel 
regression. The main difficulty which arises is a different limit 
law compared to the case discussed in the present, where T is 
fixed. However, this is beyond the scope of this paper and will 
be an interesting topic for further research. 



It can be seen from Table 5.2 that the main difference between 



the two models is that our model has a lower disk scale height 
zq. The value a* was found to be slightly better for our new 
model compared to the BGS parameters. However, recall that 
we used only a part of the COBE/DIRBE L-band surface bright- 
ness data in our fit. 



6. Results and conclusions 

We have argued that classical measures of goodness of fit 
adopted from checking distributional assumptions can be mis- 
leading in the context of (inverse) regression. Particularly, an 
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Galactic longitude [deg] 



Fig. 3. COBE/DIRBE L-band data. Contours levels are given in magnitudes. Note that contour levels are only defined up to a 
common offset. 
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Fig. 4. Square difference between COBE/DIRBE L-band data logarithmic surface brightness (magnitudes) and the parametric 
model with the parameters from BGS. Contours levels are n x O.lmag^, n G JN. 




u [deg] 



Fig. 5. The smoothed COBE/DIRBE L-band data. This is the observed data in the sky region |; | < 60° and -20°<b< 10°, after 
application of the smoothing operator T. Contours indicated are for the natural logarithm of Tui. Note that the x and y axis are 
from |?|<60° and -20°<6<10°, due to the definition of T. 



inhomogeneous noise field can inflate the precision of com- Appendix A: Cliosing tlie smootliing kernel T. 

mon quantities. For this case, a new method was proposed 



for noisy Fredholm eqs. of the first kind by Munk & Ruym- 
gaart (1999). As an example for the application of the sug- 
gested algorithm, we use the problem of determining the lumi- 
nosity density in the MW from surface brightness data. From 
this w e have found that the parametric model in Binney et al. 
( 1997 ) can be improved slightly and gives a satisfactory fit of 
the COBE/DIRBE L-band data in a range of -20° < 6 < 10°. 



We mention that our procedure can also be performed with any 
other smoothing kernel T. This will also yield in general dif- 
ferent values of a*. In principle, a valid option is any injective 
Operator T. A good choice of T, however, is driven by various 
aspects, such as efficiency or simplicity. An extensive simula- 
tion study performed in Munk & Ruymgaart (1999), reveals 
the kernel T{u, v) = min(M, v) as a reasonable choice which 
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Parameters 


zo 




a 




d 


n 


C 




6 


a* 


"Our" 


0.162 


0.042 


0.27 


2.56 


0.41 


0.502 


0.59 


1.90 


306.1 


0.86 


"BGS" 


0.21 


0.042 


0.27 


2.5 


0.463 


0.5 


0.6 


1.9 


234.4 


0.80 



Table 1. Parameter values for our model and the model according to BGS and the statistical parameter a* Note that the 
parameters a and z\ are not fitted 




Galactic longitude [deg] 

Fig. 6. Difference between COBE/DIRBE L-band data logarithmic surface brightness (magnitudes) and the parametric model 
from BGS in magnitudes. Negatives values for the contour levels indicate that a model is too bright as compared to the data. The 
contour levels are chosen such that their squares are equivalent to « 0.1™, « 0.2'"and « 0.3™, to allow a direct comparison 
with fig. H 



yields a procedure capable to detect a broad range of deviations 
from U . See, however, the discussion in sec. 5. A particularly 
simple choice in noisy inverse models 

J/, = -KfiU) + 

can be achieved if T is the adjoint of K itself, provided K is a 
smoothing operator of the type 

K/(.)- j K{;v)f{v)dv. 

However, in our application this is not easy to calculate and 
will depend on constraints which force the particular model 
to be identifiable. 
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